A tree algorithm for nearest neighbor searching in document retrieval systems
Caroline M. Eastman, Stephen F. Weiss 

May 1978 ACM SIGIR Forum, Proceedings of the 1st annual international ACM SIGIR
conference on Information storage and retrieval SIGIR '78, volume 13 issue 1

Publisher: ACM Press

Full text available: pdf(651.08 KB) Additional Information: full citation, abstract, references, citings, index terms


The problem of finding nearest neighbors to a query in a document collection is a special
case of associative retrieval, in which searches are performed using more than one key. A
nearest neighbors associative retrieval algorithm, suitable for document retrieval using
similarity matching, is described. The basic structure used is a binary tree; at each node a
set of keys (concepts) is tested to select the most promising branch. Backtracking to
initially rejected branches is allowed and ofte ...

Hierarchic document classification using Ward's clustering method 
A. El-Hamdouchi, P. Willett 

September 1986 Proceedings of the 9th annual international ACM SIGIR conference 
on Research and development in information retrieval 

Publisher: ACM Press 

Full text available: pdf(974.30 KB) Additional Information: full citation, abstract, references, citings

In this paper, we discuss the application of a recent hierarchic clustering algorithm to the 
automatic classification of files of documents. Whereas most hierarchic clustering 
algorithms involve the generation and updating of an inter-object dissimilarity matrix, this 
new algorithm is based upon a series of nearest neighbor searches. Such an approach is
appropriate to several clustering methods, including Ward's method which has been 
shown to perform well in experimental studies of hierarch ... 



Database techniques for archival of solid models
David McWherter, Mitchell Peabody, Ali C. Shokoufandeh, William Regli

May 2001 Proceedings of the sixth ACM symposium on Solid modeling and
applications

Publisher: ACM Press

Additional Information: full citation, abstract, references, citings, index terms



This paper presents techniques for managing solid models in modern relational database 
management systems. Our goal is to enable support for traditional database operations 
(sorting, distance metrics, range queries, nearest neighbors, etc) on large databases of 
solid models. As part of this research, we have developed a number of novel storage and 
retrieval strategies that extend the state-of-the-art in database research as well as 
change the way in which solid modeling software developers an ... 



Keywords: database clustering, database indexing, geometric reasoning, shape
similarity, solid modeling 

Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum
spanning tree, 2-edge, and biconnectivity
Jacob Holm, Kristian de Lichtenberg, Mikkel Thorup
July 2001 Journal of the ACM (JACM), volume 48 issue 4

Publisher: ACM Press

Full text available: pdf(378.20 KB) Additional Information: full citation, abstract, references, citings, index terms

Deterministic fully dynamic graph algorithms are presented for connectivity, minimum 
spanning tree, 2-edge connectivity, and biconnectivity. Assuming that we start with no 
edges in a graph with n vertices, the amortized operation costs are $O(\log^2 n)$ for
connectivity, $O(\log^4 n)$ for minimum spanning forest and 2-edge connectivity, and
$O(\log^5 n)$ for biconnectivity.

Keywords: 2-edge connectivity, Biconnectivity, connectivity, dynamic graph algorithms, 
minimum spanning tree 

Real-time shading

Marc Olano, Kurt Akeley, John C. Hart, Wolfgang Heidrich, Michael McCool, Jason L Mitchell,
Randi Rost

August 2004 Proceedings of the conference on SIGGRAPH 2004 course notes GRAPH '04

Publisher: ACM Press

Full text available: pdf(7.39 MB) Additional Information: full citation, abstract

Real-time procedural shading was once seen as a distant dream. When the first version of 
this course was offered four years ago, real-time shading was possible, but only with
one-of-a-kind hardware or by combining the effects of tens to hundreds of rendering passes.
Today, almost every new computer comes with graphics hardware capable of interactively 
executing shaders of thousands to tens of thousands of instructions. This course has been 
redesigned to address today's real-time shading capabili ... 

Level set and PDE methods for computer graphics 

David Breen, Ron Fedkiw, Ken Museth, Stanley Osher, Guillermo Sapiro, Ross Whitaker 

August 2004 Proceedings of the conference on SIGGRAPH 2004 course notes GRAPH '04

Publisher: ACM Press

Full text available: pdf(17.07 MB) Additional Information: full citation, abstract

Level set methods, an important class of partial differential equation (PDE) methods, 
define dynamic surfaces implicitly as the level set (iso-surface) of a sampled, evolving nD 
function. The course begins with preparatory material that introduces the concept of using
partial differential equations to solve problems in computer graphics, geometric modeling
and computer vision. This will include the structure and behavior of several different types 
of differential equations, e.g. the level set eq ... 

Document expansion for speech retrieval 
Amit Singhal, Fernando Pereira 

August 1999 Proceedings of the 22nd annual international ACM SIGIR conference on 
Research and development in information retrieval 

Publisher: ACM Press 

Full text available: pdf(253.45 KB) Additional Information: full citation, references, citings, index terms



Point-based computer graphics 

Marc Alexa, Markus Gross, Mark Pauly, Hanspeter Pfister, Marc Stamminger, Matthias
Zwicker

August 2004 Proceedings of the conference on SIGGRAPH 2004 course notes GRAPH '04

Publisher: ACM Press

Full text available: pdf(8.94 MB) Additional Information: full citation, abstract

This course introduces points as a powerful and versatile graphics primitive. Speakers
present their latest concepts for the acquisition, representation, modeling, processing, 
and rendering of point sampled geometry along with applications and research directions. 
We describe algorithms and discuss current problems and limitations, covering important 
aspects of point based graphics. 



An optimal algorithm for approximate nearest neighbor searching fixed dimensions
Sunil Arya, David M. Mount, Nathan S. Netanyahu, Ruth Silverman, Angela Y. Wu
November 1998 Journal of the ACM (JACM), volume 45 issue 6
Publisher: ACM Press

Full text available: pdf(287.94 KB) Additional Information: full citation, abstract, references, citings, index terms

Consider a set S of n data points in real d-dimensional space, $R^d$, where distances are
measured using any Minkowski metric. In nearest neighbor searching, we preprocess S
into a data structure, so that given any query point $q \in R^d$, the closest point of S to q
can be reported quickly. Given any po ...

Keywords: approximation algorithms, box-decomposition trees, closest-point queries,
nearest neighbor searching, post-office problem, priority search



Searching in metric spaces by spatial approximation
Gonzalo Navarro

August 2002 The VLDB Journal: The International Journal on Very Large Data
Bases, volume 11 issue 1
Publisher: Springer-Verlag New York, Inc.

Full text available: pdf(281.75 KB) Additional Information: full citation, abstract, citings, index terms

We propose a new data structure to search in metric spaces. A metric space is formed by 
a collection of objects and a distance function defined among them which satisfies the 
triangle inequality. The goal is, given a set of objects and a query, to retrieve those objects
close enough to the query. The complexity measure is the number of distances computed 
to achieve this goal. Our data structure, called sa-tree ("spatial approximation tree"), is 
based on approaching ... 

Keywords: Multimedia databases, Similarity or proximity search, Spatial and
multidimensional search, Spatial approximation tree 



Interactive Editing Systems: Part II
Norman Meyrowitz, Andries van Dam

September 1982 ACM Computing Surveys (CSUR), volume 14 issue 3
Publisher: ACM Press

Full text available: ^ pdfO. 17 MB) Additional Information: full citation , references , citings , index terms 



External memory algorithms and data structures: dealing with massive data
Jeffrey Scott Vitter

June 2001 ACM Computing Surveys (CSUR), volume 33 issue 2
Publisher: ACM Press

Full text available: pdf(828.46 KB) Additional Information: full citation, abstract, references, citings, index terms

Data sets in large applications are often too massive to fit completely inside the
computer's internal memory. The resulting input/output communication (or I/O) between
fast internal memory and slower external memory (such as disks) can be a major 
performance bottleneck. In this article we survey the state of the art in the design and 
analysis of external memory (or EM) algorithms and data structures, where the goal is to 
exploit locality in order to reduce the I/O costs. We consider a varie ... 

Keywords: B-tree, I/O, batched, block, disk, dynamic, extendible hashing, external 
memory, hierarchical memory, multidimensional access methods, multilevel memory, 
online, out-of-core, secondary storage, sorting 



Research sessions: clustering: Incremental and effective data summarization for
dynamic hierarchical clustering
Samer Nassar, Jorg Sander, Corrine Cheng

June 2004 Proceedings of the 2004 ACM SIGMOD international conference on
Management of data

Publisher: ACM Press

Full text available: pdf(235.15 KB) Additional Information: full citation, abstract, references

Mining informative patterns from very large, dynamically changing databases poses
numerous interesting challenges. Data summarizations (e.g., data bubbles) have been
proposed to compress very large static databases into representative points suitable for
subsequent effective hierarchical cluster analysis. In many real world applications, 
however, the databases dynamically change due to frequent insertions and deletions, 
possibly changing the data distribution and clustering structure over time. ... 

Keywords: clustering, data summarization, incremental data bubbles 



The elements of nature: interactive and realistic techniques

Oliver Deusen, David S. Ebert, Ron Fedkiw, F. Kenton Musgrave, Przemyslaw Prusinkiewicz,
Doug Roble, Jos Stam, Jerry Tessendorf

August 2004 Proceedings of the conference on SIGGRAPH 2004 course notes GRAPH '04

Publisher: ACM Press

Full text available: pdf(17.65 MB) Additional Information: full citation, abstract

This updated course on simulating natural phenomena will cover the latest research and 
production techniques for simulating most of the elements of nature. The presenters will 
provide movie production, interactive simulation, and research perspectives on the 
difficult task of photorealistic modeling, rendering, and animation of natural phenomena. 
The course offers a nice balance of the latest interactive graphics hardware-based 
simulation techniques and the latest physics-based simulation techni ... 



Index-driven similarity search in metric spaces
Gisli R. Hjaltason, Hanan Samet

December 2003 ACM Transactions on Database Systems (TODS), volume 28 issue 4

Publisher: ACM Press

Full text available: pdf(650.64 KB) Additional Information: full citation, abstract, references, citings, index terms

Similarity search is a very important operation in multimedia databases and other
database applications involving complex objects, and involves finding objects in a data set 
S similar to a query object q, based on some similarity measure. In this article, we focus 
on methods for similarity search that make the general assumption that similarity is 
represented with a distance metric d. Existing methods for handling similarity search in 
this setting typically fall into one of ... 

Keywords: Hierarchical metric data structures, distance-based indexing, nearest neighbor
queries, range queries, ranking, similarity searching 



The string edit distance matching problem with moves
Graham Cormode, S. Muthukrishnan

January 2002 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete
algorithms

Publisher: Society for Industrial and Applied Mathematics

Full text available: pdf(1.13 MB) Additional Information: full citation, abstract, references, citings

The edit distance between two strings S and R is defined to be the minimum number of 
character inserts, deletes and changes needed to convert R to S. Given a text string t of 
length n, and a pattern string p of length m, informally, the string edit distance matching 
problem is to compute the smallest edit distance between p and substrings of t. A well
known dynamic programming algorithm takes time $O(nm)$ to solve ...
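
The definition in this abstract can be illustrated with a minimal Python sketch of the classic quadratic-time dynamic program for edit distance matching. This is only the well-known baseline the abstract refers to, not the paper's algorithm with moves; the function name `min_substring_edit_distance` is a hypothetical choice made for illustration.

```python
def min_substring_edit_distance(p: str, t: str) -> int:
    """Smallest edit distance between pattern p and any substring of text t.

    Classic dynamic program, O(nm) time for len(t) = n and len(p) = m.
    col[i] holds the best cost of matching p[:i] against a substring of t
    that ends at the current text position.
    """
    m = len(p)
    col = list(range(m + 1))      # empty text prefix: p[:i] costs i deletions
    best = col[m]
    for ch in t:
        new = [0] * (m + 1)       # new[0] = 0: a match may start anywhere in t
        for i in range(1, m + 1):
            change = 0 if p[i - 1] == ch else 1
            new[i] = min(col[i - 1] + change,  # match or change a character
                         col[i] + 1,           # extra character in t
                         new[i - 1] + 1)       # extra character in p
        col = new
        best = min(best, col[m])
    return best
```

For example, `min_substring_edit_distance("abd", "xxabcxx")` evaluates to 1, because the substring "abc" differs from the pattern by one changed character.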

Link-based similarity: LSH forest: self-tuning indexes for similarity search
Mayank Bawa, Tyson Condie, Prasanna Ganesan

May 2005 Proceedings of the 14th international conference on World Wide Web

Publisher: ACM Press

Full text available: pdf(247.91 KB) Additional Information: full citation, abstract, references, index terms

We consider the problem of indexing high-dimensional data for answering (approximate) 
similarity-search queries. Similarity indexes prove to be important in a wide variety of 
settings: Web search engines desire fast, parallel, main-memory-based indexes for 
similarity search on text data; database systems desire disk-based similarity indexes for 
high-dimensional data, including text and images; peer-to-peer systems desire distributed 
similarity indexes with low communication cost. We propose an i ... 

Keywords: peer-to-peer (P2P), similarity indexes 



15 Distance browsing in spatial databases
Gisli R. Hjaltason, Hanan Samet

June 1999 ACM Transactions on Database Systems (TODS), volume 24 issue 2

Publisher: ACM Press

Full text available: pdf(460.81 KB) Additional Information: full citation, abstract, references, citings, index terms

We compare two different techniques for browsing through a collection of spatial objects 
stored in an R-tree spatial data structure on the basis of their distances from an arbitrary 
spatial query object. The conventional approach is one that makes use of a k-nearest 
neighbor algorithm where k is known prior to the invocation of the algorithm. Thus if m < 
k neighbors are needed, the k-nearest neighbor alg ... 

Keywords: R-trees, distance browsing, hierarchical spatial data structures, nearest
neighbors, ranking 

Crowd and group animation

Daniel Thalmann, Christophe Hery, Seth Lippman, Hiromi Ono, Stephen Regelous, Douglas
Sutton

August 2004 Proceedings of the conference on SIGGRAPH 2004 course notes GRAPH '04

Publisher: ACM Press

Full text available: pdf(20.19 MB) Additional Information: full citation, abstract

A continuous challenge for special effects in movies is the production of realistic virtual 
crowds, in terms of rendering and behavior. This course will present state-of-the-art 
techniques and methods. The course will explain in detail the different approaches to
create virtual crowds: particle systems with flocking techniques using attraction and
repulsion forces, copy and pasting techniques, and agent-based methods. The architecture of
software tools will be presented including the MASSIVE softwa ... 

17 RCV1: A New Benchmark Collection for Text Categorization Research
David D. Lewis, Yiming Yang, Tony G. Rose, Fan Li
December 2004 The Journal of Machine Learning Research, volume 5
Publisher: MIT Press

Full text available: pdf(628.29 KB) Additional Information: full citation, abstract, citings, index terms

Reuters Corpus Volume I (RCV1) is an archive of over 800,000 manually categorized
newswire stories recently made available by Reuters, Ltd. for research purposes. Use of 
this data for research on text categorization requires a detailed understanding of the real 
world constraints under which the data was produced. Drawing on interviews with Reuters 
personnel and access to Reuters documentation, we describe the coding policy and quality 
control procedures used in producing the RCV1 data, the inten ...

Level II technical support in a distributed computing environment 
Tim Leehane 

September 1996 Proceedings of the 24th annual ACM SIGUCCS conference on User
services
Publisher: ACM Press

Full text available: pdf(5.73 MB) Additional Information: full citation, references, index terms





Automated hoarding for mobile computers
Geoffrey H. Kuenning, Gerald J. Popek

October 1997 ACM SIGOPS Operating Systems Review, Proceedings of the sixteenth
ACM symposium on Operating systems principles SOSP '97, volume 31 issue 5

Publisher: ACM Press

Full text available: pdf(2.05 MB) Additional Information: full citation, references, citings, index terms




20 Similarity queries I: Robust and efficient fuzzy match for online data cleaning 
Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani 
June 2003 Proceedings of the 2003 ACM SIGMOD international conference on
Management of data
Publisher: ACM Press

Full text available: pdf(271.47 KB) Additional Information: full citation, abstract, references, citings, index terms

To ensure high data quality, data warehouses must validate and cleanse incoming data 

ABSTRACT

Consider a set S of n data points in real d-dimensional space, $R^d$, where distances are measured
using any Minkowski metric. In nearest neighbor searching, we preprocess S into a data structure, so
that given any query point $q \in R^d$, the closest point of S to q can be reported quickly. Given any
positive real $\epsilon$, a data point p is a $(1 + \epsilon)$-approximate nearest neighbor of q if its distance
from q is within a factor of $(1 + \epsilon)$ of the distance to the true nearest neighbor. We show that it
is possible to preprocess a set of n points in $R^d$ in $O(dn \log n)$ time and $O(dn)$ space, so that given a
query point $q \in R^d$ and $\epsilon > 0$, a $(1 + \epsilon)$-approximate nearest neighbor of q can be
computed in $O(c_{d,\epsilon} \log n)$ time, where $c_{d,\epsilon} \le d \lceil 1 + 6d/\epsilon \rceil^d$ is a factor depending only on
dimension and $\epsilon$. In general, we show that given an integer $k \ge 1$, $(1 + \epsilon)$-approximations
to the k nearest neighbors of q can be computed in additional $O(kd \log n)$ time.
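
The approximation criterion stated in this abstract can be checked directly by brute force. The Python sketch below only illustrates the definition and is not the data structure the abstract describes. The helper names `minkowski` and `is_approx_nn`, and the parameter `r` for the order of the Minkowski metric, are assumptions introduced here.

```python
def minkowski(a, b, r=2.0):
    """Minkowski distance of order r between two equal-length points."""
    return sum(abs(x - y) ** r for x, y in zip(a, b)) ** (1.0 / r)


def is_approx_nn(p, q, points, eps, r=2.0):
    """True if p is a (1 + eps)-approximate nearest neighbor of q in points.

    Scans every point to find the true nearest-neighbor distance, so it
    costs O(dn) per query and serves only to make the definition concrete.
    """
    true_nn_dist = min(minkowski(x, q, r) for x in points)
    return minkowski(p, q, r) <= (1.0 + eps) * true_nn_dist
```

With `points = [(0, 0), (3, 0), (10, 0)]` and `q = (1, 0)`, the true nearest neighbor is at distance 1, so the point `(3, 0)` at distance 2 qualifies when `eps = 1.0` but not when `eps = 0.5`.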

Large repositories of 3D data are rapidly becoming available in several fields, including
mechanical CAD, molecular biology, and computer graphics. As the number of 3D models
grows, there is an increasing need for computer algorithms to help people find the
interesting ones and discover relationships between them. Unfortunately, traditional
text-based search techniques are not always effective for 3D models, especially when queries
are geometric in nature (e.g., find me objects that fit into thi ...

A new means of evaluating the cluster hypothesis is introduced and the results of such an
evaluation are presented for four collections. The results of retrieval experiments 
comparing a sequential search, a cluster-based search, and a search of the clustered 
collection in which individual documents are scored against the query are also presented. 

Document clustering has not been well received as an information retrieval tool.
Objections to its use fall into two main categories: first, that clustering is too slow for 
large corpora (with running time often quadratic in the number of documents); and 
second, that clustering does not appreciably improve retrieval. We argue that these 
problems arise only when clustering is used in an attempt to improve conventional search 
techniques. However, looking at clustering as an informa ... 

The problem of searching the elements of a set that are close to a given query element
under some similarity criterion has a vast number of applications in many branches of 
computer science, from pattern recognition to textual and multimedia information 
retrieval. We are interested in the rather general case where the similarity criterion 
defines a metric space, instead of the more restricted case of a vector space. Many 
solutions have been proposed in different areas, in many cases without cros ... 

We present a novel approach to pseudo-feedback-based ad hoc retrieval that uses
language models induced from both documents and clusters. First, we treat the
pseudo-feedback documents produced in response to the original query as a set of pseudo-queries
that themselves can serve as input to the retrieval process. Observing that the documents
returned in response to the pseudo-queries can then act as pseudo-queries for subsequent
rounds, we arrive at a formulation of pseudo-query-based retrieval ...

The exponential growth of data demands scalable infrastructures capable of indexing and
searching rich content such as text, music, and images. A promising direction is to
combine information retrieval with peer-to-peer technology for scalability,
fault-tolerance, and low administration cost. One pioneering work along this direction is
pSearch [32, 33]. pSearch places documents onto a peer-to-peer overlay network
according to semantic vectors produced using Latent Semantic Indexing (LSI). The ...

A large fraction of the useful web comprises specification documents that largely
consist of <attribute name, numeric value> pairs embedded in text. Examples include
product information, classified advertisements, resumes, etc. The approach taken in the 
past to search these documents by first establishing correspondences between values and 
their names has achieved limited success because of the difficulty of extracting this 
information from free text. We propose a new approach that does not r ... 


Automatic categorization is the only viable method to deal with the scaling problem of the
World Wide Web. In this paper, we propose a Web page classifier based on an adaptation
of the k-Nearest Neighbor (k-NN) approach. To improve the performance of the k-NN approach,
we supplement it with a feature selection method and a term-weighting
scheme using markup tags, and reform the document-document similarity measure used in
the vector space model. In our experiments on a Korean commercial Web direct ...

k is the most important parameter in a text categorization system based on the k-nearest
neighbor algorithm (kNN). To classify a new document, the k-nearest documents in the
training set are determined first. The prediction of categories for this document can then 
be made according to the category distribution among the k nearest neighbors. Generally 
speaking, the class distribution in a training set is not even; some classes may have more 
samples than others. ... 
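
The two steps described in this abstract (find the k nearest training documents, then read off their category distribution) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the system the abstract evaluates: documents are assumed to be sparse term-weight dictionaries, cosine similarity is assumed as the nearness measure, and the names `cosine` and `knn_category_distribution` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_category_distribution(doc, training, k):
    """Count the categories of the k training documents most similar to doc.

    training is a list of (term_weights, category) pairs.
    """
    ranked = sorted(training, key=lambda item: cosine(doc, item[0]), reverse=True)
    return Counter(category for _, category in ranked[:k])
```

A prediction can then be taken from the distribution, for instance with `knn_category_distribution(doc, training, k).most_common(1)`. As the abstract notes, an uneven class distribution in the training set can bias such a vote toward the larger classes.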

Most previous work on the recently developed language-modeling approach to information
retrieval focuses on document-specific characteristics, and therefore does not take into 
account the structure of the surrounding corpus. We propose a novel algorithmic 
framework in which information provided by document-based language models is 
enhanced by the incorporation of information drawn from clusters of similar documents. 
Using this framework, we develop a suite of new algorithms. Even t ... 

The success of text-based retrieval motivates us to investigate analogous techniques
which can support the querying and browsing of image data. However, images differ
significantly from text both syntactically and semantically in their mode of representing
and expressing information. Thus, the generalization of information retrieval from the text
domain to the image domain is non-trivial. This paper presents a framework for

Syntactic phrase indexing and term clustering have been widely explored as text
representation techniques for text retrieval. In this paper we study the properties of 
phrasal and clustered indexing languages on a text categorization task, enabling us to 

The graphics processor (GPU) on today's commodity video cards has evolved into an
extremely powerful and flexible processor. The latest graphics architectures provide 
tremendous memory bandwidth and computational horsepower, with fully programmable 
vertex and pixel processing units that support vector operations up to full IEEE floating 
point precision. High level languages have emerged for graphics hardware, making this 
computational power accessible. Architecturally, GPUs are highly parallel s ...

Stemming and lemmatization were compared in the clustering of Finnish text documents.
Since Finnish is a highly inflectional and agglutinative language, we hypothesized that
lemmatization, involving splitting of the compound words, would be a more appropriate
normalization approach than the straightforward stemming. The relevance of the
documents was evaluated with a four-point relevance assessment scale, which was